-
Notifications
You must be signed in to change notification settings - Fork 819
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding an API for decode streaming. #1677
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, missing a Python API + example of how to use in python!
pub fn decode_stream(&self, skip_special_tokens: bool) -> DecodeStream<'_, M, N, PT, PP, D> { | ||
DecodeStream::new(self, skip_special_tokens) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd rather we explicitly create this DecodeStream
with DecodeStream::new(tokenizer, ...)
without adding this to the tokenizer funcs!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you wish, this follows the .iter()
pattern in regular rust as it's more convient given the lifetime bound of the DecodeStream
object.
https://doc.rust-lang.org/src/alloc/collections/vec_deque/mod.rs.html#1204
It's really just sugard, I can happily remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. No sounds good, was more thinking about the coming python api as well but in rust makes sense for sure
self.prefix_index = new_prefix_index; | ||
Ok(Some(new_text.to_string())) | ||
} else { | ||
Ok(None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
returning '�'
might be more expected (at least it's not None so people can print it still?) or ''
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it's wrong to do so. Invalid utf-8 is perfectly normal and should not be returned before enough token are accumulated (see the accent example) to provide valid utf-8. If valid utf-8 follows invalid utf-8 then both will be returned at the same time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Producing "" is also wrong, since the token really didn't produce anything, not the empty string.
} | ||
let new_text = &string[self.prefix.len()..].to_string(); | ||
let new_prefix_index = self.ids.len() - self.prefix_index; | ||
self.ids = self.ids.drain(self.read_index..).collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, the state is bound to be quite small with this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what we have in TGI, the overhead is indeed quite low. You're decoding twice as much (prefix + new text) and you have only a handful of extra tokens.
If you're OK with that, I'd keep 2 separate PRs to keep this PR with logic small enough. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
let string = self | ||
.tokenizer | ||
.decode(self.ids.as_slice(), self.skip_special_tokens)?; | ||
if string.len() > self.prefix.len() && !string.ends_with('�') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into streamed decoding issues as well and had the same solution in mind. However, I came to the conclusion that this solution has its own flaw: If you want to actually decode the �
character because it is part of the completion, this code would assume it's an incomplete UTF-8 marker and not yield anything.
The only clean solution is to offer a Decoder::decode_u8
method. I could help out here, if desired.
Fixes #1666